-
-
-
- SGMLS(1) SGMLS(1)
-
-
- NAME
- sgmls - a validating SGML parser
-
- An SGML System Conforming to
- International Standard ISO 8879 --
- Standard Generalized Markup Language
-
- SYNOPSIS
- sgmls [ -acdeglprsuv ] [ -ffile ] [ -iname ] filename...
-
- DESCRIPTION
- Sgmls parses and validates the SGML document entity in
- filename... and prints on the standard output a simple
- ASCII representation of its Element Structure Information
- Set. (This is the information set which a structure-
- controlled conforming SGML application should act upon.)
-
- The following options are available:
-
- -a Detect and report ambiguous content models.
-
- -c Describe capacity usage at the end of the parse.
-
- -d Warn about duplicate entity declarations.
-
- -e Describe open entities in error messages. Error
- messages always include the position of the most
- recently opened external entity.
-
- -ffile Redirect errors to file.
-
- -g Show the GIs of open elements in error messages.
-
- -iname Pretend that
-
- <!ENTITY % name INCLUDE>
-
- occurs at the start of the document type declara-
- tion subset in the SGML document entity. Since
- repeated definitions of an entity are ignored, this
- definition will take precedence over any other def-
- initions of this entity in the document type decla-
- ration. Multiple -i options are allowed. If the
- SGML declaration replaces the reserved name INCLUDE
- then the new reserved name will be the replacement
- text of the entity. Typically the document type
- declaration will contain
-
- <!ENTITY % name IGNORE>
-
- and will use %name; in the status keyword specifi-
- cation of a marked section declaration. In this
- case the effect of the option will be to cause the
- marked section not to be ignored.
-
- -l Output L commands giving the current line number
- and filename.
-
- -p Parse only the prolog. Sgmls will exit after pars-
- ing the document type declaration. Implies -s.
-
- -r Warn about defaulted references.
-
- -s Suppress output. Error messages will still be
- printed.
-
- -u Warn about undefined elements: elements used in the
- DTD but not defined.
-
- -v Print the version number.
-
- Entity Manager
- An external entity resides in one or more files. The
- entity manager component of sgmls maps a sequence of files
- into an entity in three sequential stages:
-
- 1. each carriage return character is turned into a
- non-SGML character;
-
- 2. each newline character is turned into a record end
- character, and at the same time a record start
- character is inserted at the beginning of each
- line;
-
- 3. the files are concatenated.
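-
- The three stages above can be sketched in Python. This is
- only an illustration, not the sgmls implementation: the
- names build_entity and file_to_records and the placeholder
- tokens <RS>, <RE> and <NON-SGML> are assumptions made for
- readability, as is the choice to give a final unterminated
- line a record start but no record end.
-
```python
# Placeholder tokens (assumptions, chosen so the result is easy to read).
RS = "<RS>"              # record start
RE = "<RE>"              # record end
NON_SGML = "<NON-SGML>"  # stands in for whatever non-SGML character is used


def file_to_records(text):
    """Apply stages 1 and 2 to the contents of one file."""
    # Stage 1: every carriage return becomes a non-SGML character.
    text = text.replace("\r", NON_SGML)
    # Stage 2: every newline becomes a record end, and a record start
    # is inserted at the beginning of each line.
    pieces = text.split("\n")
    out = []
    for index, piece in enumerate(pieces):
        is_last = index == len(pieces) - 1
        if is_last and piece == "":
            break  # the file ended with a newline: no further line
        out.append(RS + piece + ("" if is_last else RE))
    return "".join(out)


def build_entity(file_texts):
    """Stage 3: concatenate the converted files into one entity."""
    return "".join(file_to_records(text) for text in file_texts)


# Two hypothetical files; the first one uses CR LF line endings.
print(build_entity(["a\r\nb\n", "c"]))
```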
-
- A system identifier is interpreted as a list of filenames
- separated by semi-colons. If no system identifier is sup-
- plied, then the entity manager will attempt to generate a
- filename using the public identifier (if there is one) and
- other information available to it. Notation identifiers
- are not subject to this treatment. This process is con-
- trolled by the environment variable SGML_PATH; this con-
- tains a semicolon-separated list of filename templates. A
- filename template is a filename that may contain substitu-
- tion fields; a substitution field is a % character fol-
- lowed by a single letter that indicates the value of the
- substitution. If SGML_PATH uses the %S field (the value
- of which is the system identifier), then the entity man-
- ager will also use SGML_PATH to generate a filename when a
- system identifier that does not contain any semi-colons is
- supplied. The value of a substitution can either be a
- string or it can be null. The entity manager transforms
- the list of filename templates into a list of filenames by
- substituting for each substitution field and discarding
- any template that contained a substitution field whose
- value was null. It then uses the first resulting filename
- that exists and is readable. Substitution values are
- transformed before being used for substitution: firstly,
- any names that were subject to upper case substitution are
- folded to lower case; secondly, the characters +,./:=?
- and space characters are deleted. The value of the %S
- field is not transformed. The values of substitution
- fields are as follows:
-
- %% A single %.
-
- %D The entity's data content notation. This substitu-
- tion will succeed only for external data entities.
-
- %N The entity, notation or document type name.
-
- %P The public identifier if there was a public identi-
- fier, otherwise null.
-
- %S The system identifier if there was a system identi-
- fier, otherwise null.
-
- %X (This is provided mainly for compatibility with
- ARCSGML.) A three-letter string chosen as follows:
-                            |            | With public identifier
-                            |            +-------------+-----------
-                            | No public  | Device      | Device
-                            | identifier | independent | dependent
- ---------------------------+------------+-------------+-----------
- Data or subdocument entity | nsd        | pns         | vns
- General SGML text entity   | gml        | pge         | vge
- Parameter entity           | spe        | ppe         | vpe
- Document type definition   | dtd        | pdt         | vdt
- Link process definition    | lpd        | plp         | vlp
-
- The device dependent version is selected if the
- public text class allows a public text display ver-
- sion but no public text display version was speci-
- fied.
-
- %Y The type of thing for which the filename is being
- generated:
- SGML subdocument entity     sgml
- Data entity                 data
- General text entity         text
- Parameter entity            parm
- Document type definition    dtd
- Link process definition     lpd
-
- The value of the following substitution fields will be
- null unless a valid formal public identifier was supplied.
-
- %A Null if the text identifier in the formal public
- identifier contains an unavailable text indicator,
- otherwise the empty string.
-
- %C The public text class, mapped to lower case.
-
- %E The public text designating sequence (escape
- sequence) if the public text class is CHARSET, oth-
- erwise null.
-
- %I The empty string if the owner identifier in the
- formal public identifier is an ISO owner identi-
- fier, otherwise null.
-
- %L The public text language, mapped to lower case,
- unless the public text class is CHARSET, in which
- case null.
-
- %O The owner identifier (with the +// or -// prefix
- stripped).
-
- %R The empty string if the owner identifier in the
- formal public identifier is a registered owner
- identifier, otherwise null.
-
- %T The public text description.
-
- %U The empty string if the owner identifier in the
- formal public identifier is an unregistered owner
- identifier, otherwise null.
-
- %V The public text display version. This substitution
- will be null if the public text class does not
- allow a display version or if no version was speci-
- fied. If an empty version was specified, a value
- of default will be used.
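-
- The template expansion described above can be sketched in
- Python. This is an illustration under stated assumptions,
- not the sgmls code: the function names, the example
- SGML_PATH value and the example field values are all
- hypothetical, a null value is modelled as None, only the
- ordinary space is treated as a space character, and names
- are assumed to have been folded to lower case already where
- that applies.
-
```python
import os

# Characters deleted from substitution values (all fields except %S).
DELETED = set("+,./:=? ")


def transform(value):
    """Delete the characters that may not appear in a substituted value."""
    return "".join(ch for ch in value if ch not in DELETED)


def expand(template, values):
    """Expand one template; return None if any field used is null."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template):
            field = template[i + 1]
            i += 2
            if field == "%":
                out.append("%")  # %% is a single %
                continue
            value = values.get(field)
            if value is None:
                return None  # null value: the whole template is discarded
            # The value of %S is used as-is; other values are transformed.
            out.append(value if field == "S" else transform(value))
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def candidates(sgml_path, values):
    """Turn a semicolon-separated template list into a filename list."""
    names = (expand(template, values) for template in sgml_path.split(";"))
    return [name for name in names if name is not None]


def first_readable(names):
    """Return the first filename that exists and is readable, else None."""
    for name in names:
        if os.path.isfile(name) and os.access(name, os.R_OK):
            return name
    return None


# Hypothetical SGML_PATH and field values for a parameter entity.
example_path = "/usr/sgml/%O/%C/%T;%N.%Y;%S"
example_values = {"O": "ISO 8879-1986", "C": "entities",
                  "T": "Added Latin 1", "N": "isolat1",
                  "Y": "parm", "S": None}
print(candidates(example_path, example_values))
```
-
- In this example the third template is discarded because no
- system identifier was supplied, so %S is null.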
-
- System declaration
- The system declaration for sgmls is as follows:
-
- SYSTEM "ISO 8879-1986"
- CHARSET
- BASESET "ISO 646-1983//CHARSET
- International Reference Version (IRV)//ESC 2/5 4/0"
- DESCSET 0 128 0
- CAPACITY PUBLIC "ISO 8879-1986//CAPACITY Reference//EN"
- FEATURES
- MINIMIZE DATATAG NO OMITTAG YES RANK NO SHORTTAG YES
- LINK SIMPLE NO IMPLICIT NO EXPLICIT NO
- OTHER CONCUR NO SUBDOC YES 1 FORMAL YES
- SCOPE DOCUMENT
- SYNTAX PUBLIC "ISO 8879-1986//SYNTAX Reference//EN"
- SYNTAX PUBLIC "ISO 8879-1986//SYNTAX Core//EN"
- VALIDATE
- GENERAL YES MODEL YES EXCLUDE YES CAPACITY YES
- NONSGML YES SGML YES FORMAL YES
- SDIF
- PACK NO UNPACK NO
-
- The memory usage of sgmls is not a function of the capac-
- ity points used by a document; however, sgmls can handle
- capacities significantly greater than the reference capac-
- ity set.
-
- In some environments, higher values may be supported for
- the SUBDOC parameter.
-
- Documents that do not use optional features are also sup-
- ported. For example, if FORMAL NO is specified in the
- SGML declaration, public identifiers will not be required
- to be valid formal public identifiers.
-
- Certain parts of the concrete syntax may be changed:
-
- The shunned character numbers can be changed.
-
- Uppercase substitution can be performed or not per-
- formed both for entity names and for other names.
-
- Either short reference delimiters assigned by the
- reference delimiter set or no short reference
- delimiters are supported.
-
- The reserved names can be changed.
-
- The quantity set can be increased within certain
- limits subject to there being sufficient memory
- available. The upper limit on NAMELEN is 239. The
- upper limits on ATTCNT, ATTSPLEN, BSEQLEN, ENTLVL,
- LITLEN, PILEN, TAGLEN, and TAGLVL are more than
- thirty times greater than the reference limits. The
- upper limit on GRPCNT, GRPGTCNT, and GRPLVL is 253.
- NORMSEP cannot be changed. DTAGLEN and DTEMPLEN are
- irrelevant since sgmls does not support the DATATAG
- feature.
-
- Ambiguous content models are reported (as specified by
- MODEL YES) only if the -a option is given.
-
- SGML declaration
- The SGML declaration may be omitted; in that case the
- following declaration will be implied:
- <!SGML "ISO 8879-1986"
- CHARSET
- BASESET "ISO 646-1983//CHARSET
- International Reference Version (IRV)//ESC 2/5 4/0"
- DESCSET 0 9 UNUSED
- 9 2 9
- 11 2 UNUSED
- 13 1 13
- 14 18 UNUSED
- 32 95 32
- 127 1 UNUSED
- CAPACITY PUBLIC "ISO 8879-1986//CAPACITY Reference//EN"
- SCOPE DOCUMENT
- SYNTAX PUBLIC "ISO 8879-1986//SYNTAX Reference//EN"
- FEATURES
- MINIMIZE DATATAG NO OMITTAG YES RANK NO SHORTTAG YES
- LINK SIMPLE NO IMPLICIT NO EXPLICIT NO
- OTHER CONCUR NO SUBDOC YES 99999999 FORMAL YES
- APPINFO NONE>
- with the exception that characters 128 through 254 will be
- assigned to DATACHAR. When exporting documents that use
- characters in this range, an accurate description of the
- upper half of the document character set should be added
- to this declaration. For ISO Latin-1, an appropriate
- description would be:
- BASESET "ISO Registration Number 100//CHARSET
- ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
- DESCSET 128 32 UNUSED
- 160 95 32
- 255 1 UNUSED
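-
- Each DESCSET entry above is a triple: the first described
- character number, the number of characters, and either the
- corresponding base set character number or UNUSED. As a
- rough check, the triples can be expanded in Python to
- confirm that they cover their range without gaps. The
- function name expand_descset and the use of None for UNUSED
- are assumptions of this sketch.
-
```python
# (first described character, count, base character or None for UNUSED)
IMPLIED = [(0, 9, None), (9, 2, 9), (11, 2, None), (13, 1, 13),
           (14, 18, None), (32, 95, 32), (127, 1, None)]
LATIN1_UPPER = [(128, 32, None), (160, 95, 32), (255, 1, None)]


def expand_descset(triples):
    """Map each described character number to its base number or None."""
    mapping = {}
    for start, count, base in triples:
        for offset in range(count):
            mapping[start + offset] = None if base is None else base + offset
    return mapping


lower = expand_descset(IMPLIED)
upper = expand_descset(LATIN1_UPPER)
# Each half should describe exactly 128 consecutive character numbers.
print(sorted(lower) == list(range(128)))
print(sorted(upper) == list(range(128, 256)))
```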
-
- Output format
- The output is a series of lines. Lines can be arbitrarily
- long. Each line consists of an initial command character
- and one or more arguments. Arguments are separated by a
- single space, but when a command takes a fixed number of
- arguments the last argument can contain spaces. There is
- no space between the command character and the first argu-
- ment. Arguments can contain the following escape
- sequences.
-
- \\ A \.
-
- \n A record end character.
-
- \| Internal SDATA entities are bracketed by these.
-
- \nnn The character whose code is nnn octal.
-
- \s A space. This is used only in filenames or nota-
- tion identifiers that contain a space. (Filenames
- can occur in the S, E and L commands.)
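-
- An application reading this output has to undo these
- escapes. A minimal Python sketch follows; the function name
- unescape and its two keyword parameters are assumptions, as
- is the reading that an octal escape always has exactly three
- digits. The caller chooses how a record end and an SDATA
- bracket are represented.
-
```python
import re

# A backslash followed by \, n, s, | or exactly three octal digits.
_ESCAPE = re.compile(r"\\(\\|n|s|\||[0-7]{3})")


def unescape(arg, record_end="\n", sdata_mark="|"):
    """Decode the escape sequences in one output argument."""
    def replace(match):
        code = match.group(1)
        if code == "\\":
            return "\\"
        if code == "n":
            return record_end
        if code == "s":
            return " "
        if code == "|":
            return sdata_mark
        return chr(int(code, 8))  # \nnn: character with octal code nnn
    return _ESCAPE.sub(replace, arg)


print(unescape(r"my\sfile.sgm"))
```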
-
- The possible command characters and arguments are as fol-
- lows:
-
- (gi The start of an element whose generic identifier is
- gi. Any attributes for this element will have been
- specified with A commands.
-
- )gi The end of an element whose generic identifier is gi.
-
- -data Data.
-
- &name A reference to an external data entity name; name
- will have been defined using an E command.
-
- ?pi A processing instruction with data pi.
-
- Aname val
- The next element to start has an attribute name
- with value val which takes one of the following
- forms:
-
- IMPLIED
- The value of the attribute is implied.
-
- CDATA data
- The attribute is character data. This is
- used for attributes whose declared value is
- CDATA.
-
- TOKEN token...
- The attribute is a list of tokens. This is
- used for attributes whose declared value is
- a name token group or one of NAME, NMTOKEN,
- NUTOKEN, NUMBER, NAMES, NMTOKENS, NUTOKENS
- or NUMBERS.
-
- NOTATION nname
- The attribute is a notation name; nname will
- have been defined using a N command. This
- is used for attributes whose declared value
- is NOTATION.
-
- ENTITY name...
- The attribute is a list of general entity
- names. Each entity name will have been
- defined using an I, E or S command. This is
- used for attributes whose declared value is
- ENTITY or ENTITIES.
-
- ID id The attribute is an id value. This is used
- for attributes whose declared value is ID.
-
- IDREF id...
- The attribute is a list of id references.
- This is used for attributes whose declared
- value is IDREF or IDREFS.
-
- Dename name val
- This is the same as the A command, except that it
- specifies a data attribute for an external entity
- named ename. Any D commands will come after the E
- command that defines the entity to which they
- apply, but before any & or A commands that
- reference the entity.
-
- Nnname sysid pubid
- Define a notation nname associated with system
- identifier sysid, and public identifier pubid. If
- no system identifier was specified, the notation
- name will be used (after being transformed as if it
- were a substitution value). If no public identi-
- fier was specified, pubid will be omitted. A nota-
- tion will only be defined if it is to be referenced
- in an E command or in an A command for an attribute
- with a declared value of NOTATION.
-
- Eename typ nname filename...
- Define an external data entity named ename with
- type typ (CDATA, NDATA or SDATA), notation nname and
- a list of files filename...; nname will have been
- defined using a N command. Data attributes may be
- specified for the entity using D commands. An
- external data entity will only be defined if it is
- to be referenced in a & command or in an A command
- for an attribute whose declared value is ENTITY or
- ENTITIES.
-
- Iename typ text
- Define an internal data entity named ename with
- type typ (CDATA or SDATA) and entity text text. An
- internal data entity will only be defined if it is
- referenced in an A command for an attribute whose
- declared value is ENTITY or ENTITIES.
-
- Sename filename...
- Define a subdocument entity named ename with a list of
- files filename.... Such an entity will only be defined
- if it is referenced in an A command for an
- attribute whose declared value is ENTITY or ENTI-
- TIES.
-
- {ename The start of the SGML subdocument entity ename.
-
- }ename The end of the SGML subdocument entity ename.
-
- Llineno file
- Llineno
- Set the current line number and filename. The
- filename argument will be omitted if only the line
- number has changed. This will be output only if
- the -l option has been given.
-
- #text An APPINFO parameter of text was specified in the
- SGML declaration. This is not strictly part of the
- ESIS, but a structure-controlled application is
- permitted to act on it. No # command will be out-
- put if APPINFO NONE was specified. A # command
- will occur at most once, and may be preceded only
- by a single L command.
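-
- Because every line starts with a command character, the
- output is easy to consume line by line. The Python sketch
- below is one possible reader, not part of sgmls: the name
- read_esis, the event tuples and the sample lines are
- hypothetical. It collects A commands until the next element
- starts, handles only the (, ), - and A commands specially,
- and leaves escape sequences undecoded.
-
```python
def read_esis(lines):
    """Yield simple events from sgmls output lines."""
    pending = {}  # attributes announced by A commands for the next element
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        # No space separates the command character from its first argument.
        command, rest = line[0], line[1:]
        if command == "A":
            # Aname val: val is a type keyword, optionally followed by data.
            parts = rest.split(" ", 2)
            name = parts[0]
            kind = parts[1] if len(parts) > 1 else ""
            value = parts[2] if len(parts) > 2 else ""
            pending[name] = (kind, value)
        elif command == "(":
            yield ("start", rest, pending)
            pending = {}
        elif command == ")":
            yield ("end", rest)
        elif command == "-":
            yield ("data", rest)
        else:
            yield ("other", command, rest)


# Hypothetical output for a one-element document.
sample = ["ATITLE CDATA Hello world", "AID IMPLIED", "(DOC",
          "-some text", ")DOC"]
for event in read_esis(sample):
    print(event)
```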
-
- BUGS
- Non-SGML characters in literals are counted as two charac-
- ters for the purposes of quantity and capacity calcula-
- tions.
-
- No error message is given for character references or
- marked section declarations between the end of the DTD and
- the start of the document element.
-
- SEE ALSO
- The SGML Handbook, Charles F. Goldfarb
- ISO 8879 (Standard Generalized Markup Language), Interna-
- tional Organization for Standardization
-
- ORIGIN
- ARCSGML was written by Charles F. Goldfarb.
-
- Sgmls was derived from ARCSGML by James Clark
- (jjc@jclark.com), to whom bugs should be reported.